Masader Form
Please first make sure that the dataset is not already included in the catalogue: https://arbml.github.io/masader/
Email *
Name of the dataset *
For example CALLHOME: Egyptian Arabic Speech Translation Corpus
Subsets
The different subsets of the dataset, if it is split by dialect. Put each subset on a new line in the format: subset-name, number of samples, type [tokens, sentences, documents]. For example: Algerian, 2000, sentences
Link *
Direct link to the dataset repository
Hugging Face Link
License *
Use the short identifier, for example CC BY-SA 4.0.
Year *
Year the dataset/paper was published
Language *
Dialect *
Use "mixed" if the dataset contains multiple dialects.
Domain *
Form *
Collection Style *
Description *
Brief description of the dataset
Volume *
The number of samples in the dataset; this is closely tied to the Unit option. For example, if the dataset has 10K tweets, enter Volume: 10,000 and Unit: sentences. Please don't use 10K or any other abbreviation.
Unit *
Tokens are usually used for NER, POS tagging, etc.; sentences for sentiment analysis; documents for text modelling tasks.
Ethical Risks
Social media datasets are considered medium risk, as they might reveal personal information; those that might also contain hate speech are considered high risk.
Provider
Name of the institution, e.g. NYU Abu Dhabi
Derived From
If the dataset is extracted or collected from another dataset, put the name of that dataset.
Paper Title
Paper Link
Direct link to the PDF of the paper, e.g. https://arxiv.org/pdf/2110.06744.pdf
Script *
Tokenized *
Is the dataset tokenized? e.g. الرجل ("the man") = ال رجل
Host *
Where the data is hosted, e.g. GitHub, GitLab, Kaggle, etc.
Access *
Cost
For example 1750 $
Test split *
Does the dataset have a validation/test split?
Tasks *
If you choose "Other", use commas to separate multiple tasks, e.g. sarcasm detection, abusive language detection, etc.
Venue Title
Venue abbreviation, e.g. ACL
Citations
Number of citations
Venue Type
Venue Name
Full name, e.g. Association for Computational Linguistics
Authors
Add all authors, separated by commas
Affiliations
Abstract
Abstract of the published paper
Added by *
Put your full name in English
Notes
A copy of your responses will be emailed to the address you provided.